NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

When Backdoors Speak: Understanding LLM Backdoor Attacks Through Model-Generated Explanations

https://doi.org/10.18653/v1/2025.acl-long.114

Ge, Huaizhi; Li, Yiming; Wang, Qifan; Zhang, Yongfeng; Tang, Ruixiang (October 2025, Association for Computational Linguistics)

Free, publicly accessible full text available October 3, 2026
TrustAgent: Towards Safe and Trustworthy LLM-based Agents

https://doi.org/10.18653/v1/2024.findings-emnlp.585

Hua, Wenyue; Yang, Xianjun; Jin, Mingyu; Li, Zelong; Cheng, Wei; Tang, Ruixiang; Zhang, Yongfeng (November 2024, Association for Computational Linguistics)

Full Text Available
Taylor Unswift: Secured Weight Release for Large Language Models via Taylor Expansion

Wang, Guanchu; Chuang, Yu-Neng; Tang, Ruixiang; Zhong, Shaochen; Yuan, Jiayi; Jin, Hongye; Liu, Zirui; Chaudhary, Vipin; Xu, Shuai; Caverlee, James; et al. (November 2024, The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP))

Full Text Available
Towards Debiasing DNN Models from Spurious Feature Influence

https://doi.org/10.1609/aaai.v36i9.21185

Du, Mengnan; Tang, Ruixiang; Fu, Weijie; Hu, Xia (June 2022, Proceedings of the AAAI Conference on Artificial Intelligence)

Recent studies indicate that deep neural networks (DNNs) are prone to discriminate against certain demographic groups. We observe that algorithmic discrimination can be explained by the models' high reliance on fairness-sensitive features. Motivated by this observation, we propose to achieve fairness by preventing DNN models from capturing the spurious correlation between those fairness-sensitive features and the underlying task. Specifically, we first train a bias-only teacher model that is explicitly encouraged to make maximal use of fairness-sensitive features for prediction. The teacher model then counter-teaches a debiased student model so that the interpretation of the student model is orthogonal to that of the teacher model. The key idea is that, since the teacher model relies explicitly on fairness-sensitive features for prediction, the orthogonal interpretation loss forces the student network to reduce its reliance on sensitive features and instead capture more task-relevant features for prediction. Experimental analysis indicates that our framework substantially reduces the model's attention on fairness-sensitive features. Experimental results on four datasets further validate that our framework consistently improves fairness with respect to three group fairness metrics, with comparable or even better accuracy.
Full Text Available
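The abstract above does not give the exact form of the orthogonal interpretation loss, so the following is only a minimal sketch under stated assumptions. It assumes each model's "interpretation" is a per-example attribution vector over input features (for example, an input-gradient saliency map) and that orthogonality is encouraged by penalizing the squared cosine similarity between the student's and teacher's attributions. The names `orthogonal_interpretation_loss`, `total_loss`, and the weight `lam` are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float], eps: float = 1e-12) -> float:
    """Cosine similarity between two attribution vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("attribution vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # eps keeps the division defined when an attribution vector is all zeros.
    return dot / (norm_a * norm_b + eps)


def orthogonal_interpretation_loss(student_attr: Sequence[float],
                                   teacher_attr: Sequence[float]) -> float:
    """Hypothetical penalty: squared cosine similarity.

    It is 0 when the student's attribution is orthogonal to the bias-only
    teacher's, and close to 1 when the two are parallel or anti-parallel.
    """
    return cosine_similarity(student_attr, teacher_attr) ** 2


def total_loss(task_loss: float,
               student_attr: Sequence[float],
               teacher_attr: Sequence[float],
               lam: float = 1.0) -> float:
    """Task loss plus a weighted orthogonality penalty (lam is an assumed hyperparameter)."""
    return task_loss + lam * orthogonal_interpretation_loss(student_attr, teacher_attr)


if __name__ == "__main__":
    # Assumed toy setup: feature 0 is the fairness-sensitive feature the teacher relies on.
    teacher = [1.0, 0.0, 0.0]
    print(orthogonal_interpretation_loss([0.0, 0.6, 0.8], teacher))  # student ignores feature 0
    print(orthogonal_interpretation_loss([0.9, 0.1, 0.0], teacher))  # student mimics the teacher
```

Squaring the cosine similarity is one design choice among several: it penalizes a student that relies on the sensitive feature with the opposite sign as well, since negative reliance is still reliance. In an actual training loop, the attributions would be computed with an autodiff framework so the penalty can be backpropagated into the student; this sketch only shows the arithmetic of the penalty.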
Mitigating Gender Bias in Captioning Systems

https://doi.org/10.1145/3442381.3449950

Tang, Ruixiang; Du, Mengnan; Li, Yuening; Liu, Zirui; Zou, Na; Hu, Xia (April 2021, WWW '21: Proceedings of the Web Conference 2021)

Full Text Available
Fairness via Representation Neutralization

Du, Mengnan; Mukherjee, Subhabrata; Wang, Guanchu; Tang, Ruixiang; Awadallah, Ahmed Hassan; Hu, Xia (January 2021, 2021 Conference on Neural Information Processing Systems)

Full Text Available
Understanding Social Biases Behind Location Names in Contextual Word Embedding Models

https://doi.org/10.1109/TCSS.2021.3106003

Wu, Fangsheng; Du, Mengnan; Fan, Chao; Tang, Ruixiang; Yang, Yang; Mostafavi, Ali; Hu, Xia (January 2021, IEEE Transactions on Computational Social Systems)

Full Text Available
